NSF PAR Search | NSF Public Access Repository



A Long Way to Go: Investigating Length Correlations in RLHF

Singhal, Prasann; Goyal, Tanya; Xu, Jiacheng; Durrett, Greg (October 2024, Proceedings of the Conference on Language Modeling (COLM))

Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

https://doi.org/10.18653/v1/2023.acl-long.650

Tang, Liyan; Goyal, Tanya; Fabbri, Alex; Laban, Philippe; Xu, Jiacheng; Yavuz, Semih; Kryscinski, Wojciech; Rousseau, Justin; Durrett, Greg (January 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems’ outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.
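A minimal sketch of the stratified comparison described above, assuming each annotated summary is reduced to a stratum label (the type of summarizer that produced it), a binary gold factuality label, and a real-valued metric score. The `Example` record, the stratum names, the fixed decision threshold, and the use of balanced accuracy are illustrative assumptions, not the paper's released code or exact protocol.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Example(NamedTuple):
    """One annotated summary (hypothetical record layout, not the paper's format)."""
    stratum: str       # hypothetical label for the summarizer type, e.g. "older" or "recent"
    is_factual: bool   # gold human judgment
    score: float       # factuality metric output; higher means "more likely factual"


def balanced_accuracy(gold: list[bool], pred: list[bool]) -> float:
    """Mean recall over the factual and non-factual classes present in `gold`."""
    recalls = []
    for cls in (True, False):
        idx = [i for i, g in enumerate(gold) if g == cls]
        if idx:
            recalls.append(sum(pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def stratified_scores(examples: Iterable[Example], threshold: float) -> dict[str, float]:
    """Threshold the metric, then compute balanced accuracy separately per stratum."""
    groups: dict[str, list[Example]] = defaultdict(list)
    for ex in examples:
        groups[ex.stratum].append(ex)
    return {
        stratum: balanced_accuracy(
            [ex.is_factual for ex in group],
            [ex.score >= threshold for ex in group],
        )
        for stratum, group in groups.items()
    }


if __name__ == "__main__":
    # Invented toy data, purely to show the mechanics (not results from the paper).
    toy = [
        Example("older", True, 0.9), Example("older", False, 0.2),
        Example("older", False, 0.4), Example("older", True, 0.7),
        Example("recent", True, 0.6), Example("recent", False, 0.65),
        Example("recent", True, 0.8), Example("recent", False, 0.3),
    ]
    print(stratified_scores(toy, threshold=0.5))  # → {'older': 1.0, 'recent': 0.75}
```

Reporting one score per stratum rather than a single pooled number is what makes differences between summarizer types visible; a metric can look strong overall while being much weaker on one group.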
Training Dynamics for Text Summarization Models

https://doi.org/10.18653/v1/2022.findings-acl.163

Goyal, Tanya; Xu, Jiacheng; Li, Junyi Jessy; Durrett, Greg (January 2022, Findings of the Association for Computational Linguistics: ACL 2022)

Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training time or how content selection and generation strategies are learnt across iterations. In this work, we analyze the training dynamics for generation models, focusing on summarization. Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process. We find that a propensity to copy the input is learned early in the training process consistently across all datasets studied. On the other hand, factual errors, such as hallucination of unsupported facts, are learnt in the later stages, though this behavior is more varied across domains. Based on these observations, we explore complementary approaches for modifying training: first, disregarding high-loss tokens that are challenging to learn and second, disregarding low-loss tokens that are learnt very quickly in the latter stages of the training process. We show that these simple training modifications allow us to configure our model to achieve different goals, such as improving factuality or improving abstractiveness.
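A minimal sketch of the two training modifications described above, assuming per-token losses are already available as plain floats and that tokens are discarded by loss rank within one sequence using a fixed drop fraction. The function name, the `drop_fraction` parameter, and the rank-and-drop rule are hypothetical choices, not the authors' implementation. In an actual training loop the same selection would be applied as a mask on the per-token loss tensor before reduction, so that discarded tokens contribute no gradient.

```python
from typing import Literal, Sequence


def truncated_token_loss(
    token_losses: Sequence[float],
    drop_fraction: float,
    mode: Literal["high", "low"],
) -> float:
    """Mean per-token loss after discarding a fraction of tokens by loss rank.

    mode="high": drop the highest-loss tokens (hard-to-learn tokens).
    mode="low":  drop the lowest-loss tokens (tokens already learnt).
    Hypothetical helper; the rank-and-drop rule is an illustrative assumption.
    """
    if not token_losses:
        raise ValueError("need at least one token loss")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    n = len(token_losses)
    n_drop = int(n * drop_fraction)      # always < n because drop_fraction < 1
    ranked = sorted(token_losses)        # ascending loss
    if mode == "high":
        kept = ranked[: n - n_drop]      # discard the n_drop largest losses
    elif mode == "low":
        kept = ranked[n_drop:]           # discard the n_drop smallest losses
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    losses = [10.0, 1.0, 3.0, 2.0]       # invented per-token losses
    print(truncated_token_loss(losses, 0.25, "high"))  # → 2.0
    print(truncated_token_loss(losses, 0.25, "low"))   # → 5.0
```

Here `mode="high"` corresponds to the first modification (ignoring hard-to-learn tokens) and `mode="low"` to the second (ignoring tokens that are already fit).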
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adrià; et al. (January 2023, Transactions on Machine Learning Research)
